High throughput detection of molecular markers based on AFLP and high throughput sequencing

ABSTRACT

The present invention relates to a high throughput method for the identification and detection of molecular markers wherein restriction fragments are generated and suitable adaptors comprising (sample-specific) identifiers are ligated. The adapter-ligated restriction fragments may be selectively amplified with adaptor compatible primers carrying selective nucleotides at their 3′ end. The amplified adapter-ligated restriction fragments are, at least partly, sequenced using high throughput sequencing methods and the sequence parts of the restriction fragments together with the sample-specific identifiers serve as molecular marker.

This application is a Continuation of U.S. patent application Ser. No. 12/296,009, filed Feb. 6, 2009, which is a U.S. National Stage Application of PCT/NL2007/000094, filed Apr. 4, 2007, which claims priority to U.S. Provisional Patent Applications 60/788,706, filed Apr. 4, 2006 and 60/880,052, filed Jan. 12, 2007, all of which are incorporated herewith by reference in entirety.

FIELD OF THE INVENTION

The present invention relates to the field of molecular biology and biotechnology. In particular, the invention relates to the field of nucleic acid detection and identification. More in particular the invention relates to methods for the detection and identification of markers, in particular molecular markers. The invention is concerned with the provision of high throughput methods for the detection and identification of molecular markers. The invention further relates to the application of the method in the identification of and/or detection of nucleotide sequences that are related to a wide variety of genetic traits, genes, haplotypes and combinations thereof. The invention can be used in the field of high throughput detection and identification of molecular markers from any origin, be it plant, animal, human, artificial or otherwise.

BACKGROUND OF THE INVENTION

Exploration of genomic DNA has long been desired by the scientific, in particular medical, community. Genomic DNA holds the key to identification, diagnosis and treatment of diseases such as cancer and Alzheimer's disease. In addition to disease identification and treatment, exploration of genomic DNA may provide significant advantages in plant and animal breeding efforts, which may provide answers to food and nutrition problems in the world.

Many diseases are known to be associated with specific genetic components, in particular with polymorphisms in specific genes. The identification of polymorphisms in large samples such as genomes is at present a laborious and time-consuming task. However, such identification is of great value to areas such as biomedical research, developing pharmacy products, tissue typing, genotyping and population studies.

Markers, in this case genetic markers, have been used for a very long time as a genetic typing method, i.e. to connect a phenotypic trait to the presence, absence or amount of a particular part of DNA (gene). One of the most versatile genetic typing technologies is AFLP, already around for many years and widely applicable to any organism (for reviews see Savelkoul et al. J. Clin. Microbiol, 1999, 37(10), 3083-3091; Bensch et al. Molecular Ecology, 2005, 14, 2899-2914).

The AFLP technology (Zabeau & Vos, 1993; Vos et al., 1995) has found widespread use in plant breeding and other fields since its invention in the early nineties. This is due to several characteristics of AFLP, of which the most important is that no prior sequence information is needed to generate large numbers of genetic markers in a reproducible fashion. In addition, the principle of selective amplification, a cornerstone of AFLP, ensures that the number of amplified fragments can be brought in line with the resolution of the detection system, irrespective of genome size or origin.

Detection of AFLP fragments is commonly carried out by electrophoresis on slab-gels (Vos et al., 1995) or capillary electrophoresis (van der Meulen et al., 2002). The majority of AFLP markers scored in this way represent (single nucleotide) polymorphisms occurring either in the restriction enzyme recognition sites used for AFLP template preparation or their flanking nucleotides covered by selective AFLP primers. The remainder of the AFLP markers are insertion/deletion polymorphisms occurring in the internal sequences of the restriction fragments and a very small fraction of single nucleotide substitutions occurring in small restriction fragments (< approximately 100 bp), which for these fragments cause reproducible mobility variations between both alleles which can be observed upon electrophoresis; these AFLP markers can be scored co-dominantly without having to rely on band intensities.

In a typical AFLP fingerprint, the AFLP markers therefore constitute the minority of amplified fragments (less than 50 percent but often less than 20 percent), while the remainder are commonly referred to as constant AFLP fragments. The latter are nevertheless useful in the gel scoring procedure as they serve as anchor points to calculate fragment mobilities of AFLP markers and aid in quantifying the markers for co-dominant scoring. Co-dominant scoring (scoring for homo- or heterozygosity) of AFLP markers currently is restricted to the context of fingerprinting a segregating population. In a panel of unrelated lines, only dominant scoring is possible.

Although the throughput of AFLP is very high due to high multiplexing levels in the amplification and detection steps, the rate limiting step is the resolving power of electrophoresis. Electrophoresis allows unique identification of the majority of amplified fragments based on the combination of restriction enzyme combinations (EC), primer combinations (PC) and mobility, but electrophoresis is only capable of distinguishing the amplified fragments based on differences in mobility. Fragments of similar mobility are often found as so-called ‘stacked bands’ and with electrophoresis, no attention can be given to the information that is contained in so-called ‘constant bands’, i.e. amplified restriction fragments that do not appear to differ between compared species. Furthermore on a typical gel-based system, or on a capillary system such as a MegaBACE, samples must be run in parallel and only about 100-150 bands per lane on a gel or per capillary can be analysed. These limitations also hamper throughput.

Ideally, the detection system should be capable of determining the entire sequence of the amplified fragments to capture all amplified restriction fragments. However, most high throughput sequencing technologies cannot yet provide sequencing reads that encompass entire AFLP fragments, which are typically 100-500 bp in length.

So far, detection of AFLP markers/sequences by sequencing has not been economically feasible due to, among other limitations, cost limitations of Sanger dideoxy sequencing technology and other conventional sequencing technologies.

Detection by sequencing instead of mobility determination will increase throughput because:

1) polymorphisms located in the internal sequences will be detected in most (or all) amplified fragments; this will increase the number of markers per PC considerably.

2) no loss of AFLP markers due to co-migration of AFLP markers and constant bands.

3) co-dominant scoring does not rely on quantification of band intensities and is independent of the relatedness of the individuals fingerprinted.

However, detection by sequencing of the entire restriction fragment is still relatively uneconomical. Furthermore, the current state of the art sequencing technology such as disclosed herein elsewhere (from 454 Life Sciences, www.454.com and Solexa, www.solexa.com), despite their overwhelming sequencing power, can only provide sequencing fragments of limited length. Also the current methods do not allow for the simultaneous processing of many samples in one run.

DEFINITIONS

In the following description and examples a number of terms are used. In order to provide a clear and consistent understanding of the specification and claims, including the scope to be given such terms, the following definitions are provided. Unless otherwise defined herein, all technical and scientific terms used have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The disclosures of all publications, patent applications, patents and other references are incorporated herein in their entirety by reference.

Nucleic acid: a nucleic acid according to the present invention may include any polymer or oligomer of pyrimidine and purine bases, preferably cytosine, thymine, and uracil, and adenine and guanine, respectively (See Albert L. Lehninger, Principles of Biochemistry, at 793-800 (Worth Pub. 1982) which is herein incorporated by reference in its entirety for all purposes). The present invention contemplates any deoxyribonucleotide, ribonucleotide or peptide nucleic acid component, and any chemical variants thereof, such as methylated, hydroxymethylated or glycosylated forms of these bases, and the like. The polymers or oligomers may be heterogenous or homogenous in composition, and may be isolated from naturally occurring sources or may be artificially or synthetically produced. In addition, the nucleic acids may be DNA or RNA, or a mixture thereof, and may exist permanently or transitionally in single-stranded or double-stranded form, including homoduplex, heteroduplex, and hybrid states.

AFLP: AFLP refers to a method for selective amplification of nucleic acids based on digesting a nucleic acid with one or more restriction endonucleases to yield restriction fragments, ligating adaptors to the restriction fragments and amplifying the adaptor-ligated restriction fragments with at least one primer that is (part) complementary to the adaptor, (part) complementary to the remains of the recognition sequence of the restriction endonuclease, and that further contains at least one randomly selected nucleotide from amongst A, C, T, or G (or U as the case may be). AFLP does not require any prior sequence information and can be performed on any starting DNA. In general, AFLP comprises the steps of:

(a) digesting a nucleic acid, in particular a DNA or cDNA, with one or more specific restriction endonucleases, to fragment the DNA into a corresponding series of restriction fragments;
(b) ligating the restriction fragments thus obtained with a double-stranded synthetic oligonucleotide adaptor, one end of which is compatible with one or both of the ends of the restriction fragments, to thereby produce adaptor-ligated, preferably tagged, restriction fragments of the starting DNA;
(c) contacting the adaptor-ligated, preferably tagged, restriction fragments under hybridizing conditions with one or more oligonucleotide primers that contain selective nucleotides at their 3′-end;
(d) amplifying the adaptor-ligated, preferably tagged, restriction fragment hybridised with the primers by PCR or a similar technique so as to cause further elongation of the hybridised primers along the restriction fragments of the starting DNA to which the primers hybridised; and
(e) detecting, identifying or recovering the amplified or elongated DNA fragment thus obtained.

AFLP thus provides a reproducible subset of adaptor-ligated fragments. AFLP is described in EP 534858, U.S. Pat. No. 6,045,994 and in Vos et al. Reference is made to these publications for further details regarding AFLP. AFLP is commonly used as a complexity reduction technique and a DNA fingerprinting technology. Within the context of the use of AFLP as a fingerprinting technology, the concept of an AFLP marker has been developed.

AFLP marker: An AFLP marker is an amplified adaptor-ligated restriction fragment that is different between two samples that have been amplified using AFLP (fingerprinted), using the same set of primers. As such, the presence or absence of this amplified adaptor-ligated restriction fragment can be used as a marker that is linked to a trait or phenotype. In conventional gel technology, an AFLP marker shows up as a band in the gel located at a certain mobility. Other electrophoretic techniques such as capillary electrophoresis may not refer to this as a band, but the concept remains the same, i.e. a nucleic acid with a certain length and mobility. Absence or presence of the band may be indicative of (or associated with) the presence or absence of the phenotype. AFLP markers typically involve SNPs in the restriction site of the endonuclease or the selective nucleotides. Occasionally, AFLP markers may involve indels in the restriction fragment.

Constant band: a constant band in the AFLP technology is an amplified adaptor-ligated restriction fragment that is relatively invariable between samples. Thus, a constant band in the AFLP technology will, over a range of samples, show up at about the same position in the gel, i.e. has the same length/mobility. In conventional AFLP these are typically used to anchor the lanes corresponding to samples on a gel or electropherograms of multiple AFLP samples detected by capillary electrophoresis. Typically, a constant band is less informative than an AFLP marker. Nevertheless, as AFLP markers customarily involve SNPs in the selective nucleotides or the restriction site, constant bands may comprise SNPs in the restriction fragments themselves, rendering the constant bands an interesting alternative source of genetic information that is complementary to AFLP markers.

Selective base: Located at the 3′ end of the primer that contains a part that is complementary to the adaptor and a part that is complementary to the remains of the restriction site, the selective base is randomly selected from amongst A, C, T or G. By extending a primer with a selective base, the subsequent amplification will yield only a reproducible subset of the adaptor-ligated restriction fragments, i.e. only the fragments that can be amplified using the primer carrying the selective base. Selective nucleotides can be added to the 3′ end of the primer in a number varying between 1 and 10. Typically 1-4 suffice. Both primers may contain a varying number of selective bases. With each added selective base, the number of amplified adaptor-ligated restriction fragments in the subset is reduced by a factor of about 4. Typically, the number of selective bases used in AFLP is indicated by +N+M, wherein one primer carries N selective nucleotides and the other primer carries M selective nucleotides. Thus, an Eco/Mse +1/+2 AFLP is shorthand for the digestion of the starting DNA with EcoRI and MseI, ligation of appropriate adaptors and amplification with one primer directed to the EcoRI restricted position carrying one selective base and the other primer directed to the MseI restricted site carrying 2 selective nucleotides. A primer used in AFLP that carries at least one selective nucleotide at its 3′ end is also depicted as an AFLP-primer. Primers that do not carry a selective nucleotide at their 3′ end and which in fact are complementary to the adaptor and the remains of the restriction site are sometimes indicated as AFLP+0 primers.
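The roughly fourfold reduction per selective base can be illustrated with a short Python sketch. The function name `expected_subset_size` and the starting count of 64,000 fragments are hypothetical, and the model assumes that the four bases occur with equal frequency next to the restriction site, which real genomes only approximate.

```python
def expected_subset_size(total_fragments: int, n: int, m: int) -> float:
    """Approximate number of fragments amplified in a +N/+M AFLP reaction.

    Assumption (not from the source): each selective base retains about
    one quarter of the fragments, independently of the other bases.
    """
    if n < 0 or m < 0:
        raise ValueError("numbers of selective bases must be non-negative")
    return total_fragments / 4 ** (n + m)


if __name__ == "__main__":
    # Hypothetical starting pool of 64,000 adaptor-ligated fragments.
    for n, m in [(0, 0), (1, 2), (3, 3)]:
        size = expected_subset_size(64_000, n, m)
        print(f"+{n}/+{m}: about {size:g} fragments")
```

Under these assumptions an Eco/Mse +1/+2 reaction keeps about 1/64 of the starting fragments.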

Clustering: with the term “clustering” is meant the comparison of two or more nucleotide sequences based on the presence of short or long stretches of identical or similar nucleotides. Several methods for alignment of nucleotide sequences are known in the art, as will be further explained below. Sometimes the terms “assembly” or “alignment” are used as synonyms.

Identifier: a short sequence that can be added to an adaptor or a primer or included in its sequence or otherwise used as label to provide a unique identifier. Such a sequence identifier can be a unique base sequence of varying but defined length uniquely used for identifying a specific nucleic acid sample. For instance 4 bp tags allow 4^4 = 256 different tags. Typical examples are ZIP sequences, known in the art as commonly used tags for unique detection by hybridization (Iannone et al. Cytometry 39:131-140, 2000). Using such an identifier, the origin of a PCR sample can be determined upon further processing. In the case of combining processed products originating from different nucleic acid samples, the different nucleic acid samples are generally identified using different identifiers.

Sequencing: The term sequencing refers to determining the order of nucleotides (base sequences) in a nucleic acid sample, e.g. DNA or RNA.

High-throughput screening: High-throughput screening, often abbreviated as HTS, is a method for scientific experimentation especially relevant to the fields of biology and chemistry. Through a combination of modern robotics and other specialised laboratory hardware, it allows a researcher to effectively screen large amounts of samples simultaneously.

Restriction endonuclease: a restriction endonuclease or restriction enzyme is an enzyme that recognizes a specific nucleotide sequence (target site) in a double-stranded DNA molecule, and will cleave both strands of the DNA molecule at or near every target site.

Restriction fragments: the DNA molecules produced by digestion with a restriction endonuclease are referred to as restriction fragments. Any given genome (or nucleic acid, regardless of its origin) will be digested by a particular restriction endonuclease into a discrete set of restriction fragments. The DNA fragments that result from restriction endonuclease cleavage can be further used in a variety of techniques and can for instance be detected by gel electrophoresis.

Gel electrophoresis: in order to detect restriction fragments, an analytical method for fractionating DNA molecules on the basis of size can be required. The most commonly used technique for achieving such fractionation is (capillary) gel electrophoresis. The rate at which DNA fragments move in such gels depends on their molecular weight; thus, the distances traveled decrease as the fragment lengths increase. The DNA fragments fractionated by gel electrophoresis can be visualized directly by a staining procedure e.g. silver staining or staining using ethidium bromide, if the number of fragments included in the pattern is sufficiently small. Alternatively further treatment of the DNA fragments may incorporate detectable labels in the fragments, such as fluorophores or radioactive labels, which are preferably used to label one strand of the AFLP product.

Ligation: the enzymatic reaction catalyzed by a ligase enzyme in which two double-stranded DNA molecules are covalently joined together is referred to as ligation. In general, both DNA strands are covalently joined together, but it is also possible to prevent the ligation of one of the two strands through chemical or enzymatic modification of one of the ends of the strands. In that case the covalent joining will occur in only one of the two DNA strands.

Synthetic oligonucleotide: single-stranded DNA molecules having preferably from about 10 to about 50 bases, which can be synthesized chemically are referred to as synthetic oligonucleotides. In general, these synthetic DNA molecules are designed to have a unique or desired nucleotide sequence, although it is possible to synthesize families of molecules having related sequences and which have different nucleotide compositions at specific positions within the nucleotide sequence. The term synthetic oligonucleotide will be used to refer to DNA molecules having a designed or desired nucleotide sequence.

Adaptors: short double-stranded DNA molecules with a limited number of base pairs, e.g. about 10 to about 30 base pairs in length, which are designed such that they can be ligated to the ends of restriction fragments. Adaptors are generally composed of two synthetic oligonucleotides which have nucleotide sequences which are partially complementary to each other. When mixing the two synthetic oligonucleotides in solution under appropriate conditions, they will anneal to each other forming a double-stranded structure. After annealing, one end of the adaptor molecule is designed such that it is compatible with the end of a restriction fragment and can be ligated thereto; the other end of the adaptor can be designed so that it cannot be ligated, but this need not be the case (double ligated adaptors).

Adaptor-ligated restriction fragments: restriction fragments that have been capped by adaptors.

Primers: in general, the term primers refers to DNA strands which can prime the synthesis of DNA. DNA polymerase cannot synthesize DNA de novo without primers: it can only extend an existing DNA strand in a reaction in which the complementary strand is used as a template to direct the order of nucleotides to be assembled. We will refer to the synthetic oligonucleotide molecules which are used in a polymerase chain reaction (PCR) as primers.

DNA amplification: the term DNA amplification will be typically used to denote the in vitro synthesis of double-stranded DNA molecules using PCR. It is noted that other amplification methods exist and they may be used in the present invention without departing from the gist.

SUMMARY OF THE INVENTION

The present inventors have found that the above described problems and other problems in the art can be overcome by devising a generic way wherein the versatility and applicability of (AFLP) marker technology can be combined with that of state-of-the-art high throughput sequencing technology.

Thus, the present inventors have found that incorporation of a sample-specific identifier in the adaptor-ligated restriction fragment and/or the determination of only part of the sequence of the restriction fragment provides for a very efficient and reliable improvement of the existing technologies. It was found that by incorporation of a sample-specific identifier, multiple samples can be sequenced in a single run and by sequencing only part of the restriction fragment, adequate identification of the restriction fragment can be achieved.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic representation of the adaptor structure that is used in a regular AFLP-based approach for AFLP detection by short tag sequencing. A typical AFLP fragment derived from a digest of a DNA sample with EcoRI and MseI and subsequent adapter ligation is shown, followed by a typical adaptor for the EcoRI site. The adaptor comprises, from the 5′ to 3′ end, a 5′ primer sequence, which is optional, and can be used to anchor amplification primers or to anchor the adapter-ligated fragment to a bead or surface. Further an identifier is shown (given as NNNNNN in a degenerate form), followed by remains of a recognition sequence of a restriction enzyme (in this case EcoRI, i.e. AATTC). The last nucleotide of the identifier preferably does not comprise a G in order to destroy the EcoRI restriction site. A suitable primer is provided that comprises the optional 5′ primer sequence, an example of a specific primer (ACTGAC), remains of the recognition site and a section that may contain one or more selective nucleotides at the 3′ end.

FIG. 2 is a schematic representation of the embodiment wherein a recognition sequence for a type IIs restriction endonuclease is incorporated in the adaptor. After restriction with the type IIs enzyme, type IIs compatible adaptors can be ligated to one or both of the restricted fragments A and B. The type IIs adaptor comprises an optional primer binding (or anchoring) sequence, an identifier and a section containing (degenerate) nucleotides (NN) to hybridize to the overhang of the IIs restriction site. The associated primer may contain one or more selective nucleotides (XYZ) at its 3′ end.

DETAILED DESCRIPTION OF THE INVENTION

In one aspect, the invention relates to a method for the identification of restriction fragments in a sample, comprising the steps of:

(a) providing a sample nucleic acid;
(b) digesting the sample nucleic acid with at least one restriction endonuclease to obtain a set of restriction fragments;
(c) providing double stranded synthetic adaptors comprising
    - a 5′ primer-compatible sequence,
    - a sample-specific identifier section,
    - a section that is complementary to the remains of the recognition sequence of the restriction endonuclease;
(d) ligating the double stranded synthetic adaptors to the restriction fragments in the set, to provide a set of adaptor-ligated restriction fragments;
(e) amplification of the set of adaptor-ligated restriction fragments, with one or more primers that are at least complementary to:
    - the sample-specific identifier section,
    - the section that is complementary to the remains of the recognition sequence of the restriction endonuclease,
    to provide for amplified adaptor-ligated restriction fragments (amplicons);
(f) determining the sequence of at least the sample-specific identifier section, the remains of the recognition sequence of the restriction endonuclease and of part of the sequence of the restriction fragment located adjacent thereto of (part of) the amplified adaptor-ligated restriction fragments; and
(g) identifying the presence or absence of amplified adaptor-ligated restriction fragments in the sample.

By treating a sample nucleic acid in this way, a set of amplified restriction fragments is obtained for every sample that is sequenced. Every restriction fragment can be identified as originating from a certain sample via the sample specific identifier which is different for each sample. Sequencing of the amplified adaptor-ligated restriction fragments provides sequence information on at least part of the adaptor-ligated restriction fragment. The adaptor-derived part contains information about the sample from which the fragment is obtained, whereas sequence information from the restriction fragment itself provides information about the restriction fragment and allows for identification of the restriction fragment. This sequence information on the restriction fragment is used to identify the restriction fragment with an accuracy that depends on the number of nucleotides that is determined and the number of restriction fragments in the set of amplified adaptor-ligated restriction fragments.

To provide a solution to the problem of sampling variation which affects the accuracy of identifying molecular markers by sequencing contained in a set of multiple fragments, the present inventors have also found that detection of markers via sequencing is preferably performed with sufficient redundancy (depth) to sample all amplified fragments at least once and accompanied by statistical means which address the issue of sampling variation in relation to the accuracy of the genotypes called. Furthermore, just as with AFLP scoring, in the context of a segregating population, the simultaneous scoring of the parent individuals in one experiment will aid in determining the statistical threshold.

Thus, in certain embodiments, the redundancy of the tagged amplified adaptor-ligated restriction fragments is at least 6, preferably at least 7, more preferably at least 8 and most preferably at least 9. In certain embodiments, the sequence of each adaptor-ligated restriction fragment is determined at least 6, preferably at least 7, more preferably at least 8 and most preferably at least 9 fold. In certain embodiments, the redundancy is selected such, assuming a 50/50 overall chance of identifying the locus correctly as homozygous, that the chance of correct identification of the locus is more than 95%, 96%, 97%, 98%, 99% or 99.5%.
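The relation between redundancy and the chance of a correct call can be sketched in Python under one possible reading of the 50/50 assumption. The model below is an assumption, not taken from the text: at a heterozygous locus each read comes from either allele with probability 0.5, and the locus is wrongly called homozygous only when all n reads happen to show the same allele. The function names are hypothetical.

```python
def prob_both_alleles_seen(n_reads: int) -> float:
    """Chance that n reads of a heterozygous locus show both alleles.

    Assumed model: reads are independent and each allele is sampled with
    probability 0.5, so all reads agree with probability 2 * 0.5**n.
    """
    if n_reads < 1:
        raise ValueError("at least one read is required")
    return 1.0 - 0.5 ** (n_reads - 1)


def min_redundancy(target: float) -> int:
    """Smallest read depth whose chance of a correct call reaches target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    n = 1
    while prob_both_alleles_seen(n) < target:
        n += 1
    return n


if __name__ == "__main__":
    for depth in range(6, 10):
        print(depth, prob_both_alleles_seen(depth))
```

Under this simplified model a depth of 6 exceeds a 95% chance and a depth of 9 exceeds 99.5%, which is in line with the 6 to 9 fold redundancy range stated above; sequencing errors and unequal amplification of alleles are ignored.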

In this respect the following calculation may be illustrative: The sequencing technology of Solexa as described herein elsewhere, provides for 40,000,000 reads of about 25 bp each, totaling a staggering 1 billion bp in one single run. Assuming a redundancy in sampling of 10 times, 4,000,000 unique fragments can be assessed in one run. Combining 100 samples allows for 40,000 fragments to be sequenced for each sample. Seen from the perspective of AFLP, this amounts to 160 primer combinations with 250 fragments each.
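The arithmetic of this calculation can be retraced with a few lines of Python. The figures are those given in the text; the function name `run_capacity` and the dictionary keys are illustrative only.

```python
def run_capacity(reads: int, read_length_bp: int, redundancy: int,
                 samples: int, fragments_per_pc: int) -> dict:
    """Retrace the throughput estimate for a single sequencing run."""
    total_bp = reads * read_length_bp
    unique_fragments = reads // redundancy          # fragments covered at the chosen depth
    fragments_per_sample = unique_fragments // samples
    primer_combinations = fragments_per_sample // fragments_per_pc
    return {
        "total_bp": total_bp,
        "unique_fragments": unique_fragments,
        "fragments_per_sample": fragments_per_sample,
        "primer_combinations": primer_combinations,
    }


if __name__ == "__main__":
    result = run_capacity(reads=40_000_000, read_length_bp=25,
                          redundancy=10, samples=100, fragments_per_pc=250)
    for key, value in result.items():
        print(f"{key}: {value:,}")
```

Changing `redundancy` or `samples` shows how sequencing depth trades off against the number of samples that can be pooled in one run.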

This method allows for the identification of restriction fragments in a way that is different from that of the conventional marker detection based on electrophoresis.

In the first step of the method for the identification of restriction fragments a sample nucleic acid is provided. The nucleic acids in the sample will usually be in the form of DNA. However, the nucleotide sequence information contained in the sample may be from any source of nucleic acids, including e.g. RNA, polyA+ RNA, cDNA, genomic DNA, organellar DNA such as mitochondrial or chloroplast DNA, synthetic nucleic acids, DNA libraries (such as BAC libraries/pooled BAC clones), clone banks or any selection or combinations thereof. The DNA in the nucleic acid sample may be double stranded, single stranded, or double stranded DNA denatured into single stranded DNA. The DNA sample can be from any organism, whether plant, animal, synthetic or human.

The nucleic acid sample is restricted (or digested) with at least one restriction endonuclease to provide for a set of restriction fragments. In certain embodiments, two or more endonucleases can be used to obtain restriction fragments. The endonuclease can be a frequent cutter (a recognition sequence of 3-5 bp, such as MseI) or a rare cutter (recognition sequence of >5 bp, such as EcoRI). In certain preferred embodiments, a combination of a rare and a frequent cutter is used. In certain embodiments, in particular when the sample contains or is derived from a relatively large genome, it may be preferred to use a third enzyme (rare or frequent cutter) to obtain a larger set of restriction fragments of shorter size.
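The digestion step with a rare and a frequent cutter can be mimicked in silico. The following Python sketch is a simplification under stated assumptions: it treats the input as a linear top strand only, ignores the overhangs on the complementary strand, and uses the well-known cut positions G^AATTC for EcoRI and T^TAA for MseI. The names `ENZYMES`, `cut_positions` and `digest` are hypothetical.

```python
# Recognition sequence and top-strand cut offset for each enzyme.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # rare cutter, G^AATTC
    "MseI": ("TTAA", 1),     # frequent cutter, T^TAA
}


def cut_positions(seq: str, site: str, offset: int) -> list:
    """Return every top-strand cut position of one enzyme in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # allow overlapping sites
    return positions


def digest(seq: str, enzyme_names: list) -> list:
    """Cut a linear sequence with all listed enzymes and return the fragments."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        cuts.update(cut_positions(seq, site, offset))
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    print(digest("CCGAATTCGGTTAACC", ["EcoRI", "MseI"]))
```

Each internal fragment then starts with the remains of one recognition sequence (for example AATTC after an EcoRI cut), which is the part an adaptor is designed to match.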

As restriction endonucleases, any endonuclease will suffice. Typically, Type II endonucleases are preferred such as EcoRI, MseI, PstI etc. In certain embodiments a type IIs endonuclease may be used, i.e. an endonuclease of which the recognition sequence is located distant from the restriction site, such as AceIII, AlwI, AlwXI, Alw26I, BbvI, BbvII, BbsI, BccI, Bce83I, BcefI, BcgI, BinI, BsaI, BsgI, BsmAI, BsmFI, BspMI, EarI, EciI, Eco3II, Eco57I, Esp3I, FauI, FokI, GsuI, HgaI, HinGUII, HphI, Ksp632I, MboII, MmeI, MnII, NgoVIII, PleI, RIeAI, SapI, SfaNI, TaqJI and ZthII III. The use of this type of restriction endonuclease leads to certain adaptations to the method as will be described herein elsewhere.

Restriction fragments can be blunt-ended or have protruding ends, depending on the endonuclease used. To these ends, adaptors can be ligated. Typically, the adaptors used in the present invention have a particular design. The adaptors used in the present invention may comprise a 5′-primer compatible sequence, which may be optional to provide for sufficient length of the adaptor for subsequent primer annealing, followed by a sample-specific identifier section that may comprise from 4-16 nucleotides. Preferably the sample-specific identifier does not contain 2 or more consecutive identical bases to prevent readthroughs during the sequencing step. Furthermore, in case 2 or more samples are combined and multiple sample specific identifiers are used to distinguish the origin of the samples, there is preferably a difference between the sample-specific identifiers of at least 2, preferably 3 bp. This allows for improved discrimination between the different sample-specific identifiers within a combined pool of samples. At the 3′ end of the adaptor a section is located that is complementary to the remains of the recognition sequence of the restriction endonuclease. For instance, EcoRI recognises 5′-GAATTC-3′ and cuts between G and AATTC. For EcoRI, the section complementary to the remains of the recognition sequence of the restriction endonuclease hence is a C-nucleotide.
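The two design preferences for sample-specific identifiers (no two consecutive identical bases, and a difference of at least 2 or 3 positions between any two identifiers) can be expressed as a small Python sketch. The greedy selection used here is an illustrative assumption, not a procedure described in the text, and the function names are hypothetical.

```python
from itertools import product


def has_repeat(tag: str) -> bool:
    """True if the tag contains two or more consecutive identical bases."""
    return any(a == b for a, b in zip(tag, tag[1:]))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))


def design_identifiers(length: int, min_distance: int, count: int) -> list:
    """Greedily pick up to `count` tags that satisfy both preferences."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        tag = "".join(bases)
        if has_repeat(tag):
            continue
        if all(hamming(tag, other) >= min_distance for other in chosen):
            chosen.append(tag)
            if len(chosen) == count:
                break
    return chosen


if __name__ == "__main__":
    # Ten 4 bp identifiers that differ from each other in at least 2 positions.
    print(design_identifiers(length=4, min_distance=2, count=10))
```

A greedy scan does not necessarily find the largest possible set; it merely returns a valid one, and the function returns fewer than `count` tags when the constraints cannot be met.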

The adaptor is ligated (covalently connected) with one or both sides ofthe restriction fragment. When digestion is performed with more than oneendonuclease, different adaptors may be used which will give rise todifferent sets of adaptor-ligated restriction fragments.

The adaptor-ligated restriction fragments are subsequently amplified with a set of one or more primers. The primer may be complementary to the adaptor only, i.e. non-selective amplification. The primer preferably contains a section that is complementary to the sample-specific identifier and a section that is complementary to the remains of the recognition sequence of the restriction endonuclease. In certain embodiments, the primer may contain at its 3′ end one or more selective nucleotides to provide for a subset of amplified adapter-ligated restriction fragments. The primer may at its 5′ end also contain further nucleotides to aid in anchoring the primer to the adapter-ligated restriction fragments. In certain embodiments, the primer may contain nucleotides that express improved hybridisation characteristics such as LNAs or PNAs. To amplify adapter-ligated restriction fragments from combined samples in a pool it is possible to use sets of degenerated primers, i.e. primer sets wherein for each sample, the corresponding sample-identifier is incorporated in the primer. In certain embodiments, it is possible to use primer sets wherein the identifier section is completely degenerated (or at least to a large extent) i.e. (almost) every combination of nucleotides is provided in the sample specific identifier section. Combined with stringent hybridisation conditions in the amplification and the optional use of LNA or PNA-type nucleotides to increase hybridisation characteristics, this may lead to a very efficient amplification.

The amplification of the adapter-ligated restriction fragments leads to a set of amplified adapter-ligated restriction fragments, sometimes referred to as amplicons.

The amplicons (or at least part thereof) are subjected to a step that comprises at least the determination of the sequence of the sample specific identifier to determine the origin of the fragment and of part of the sequence of the restriction fragment. In practice this amounts also to the determination of the sections located in-between such as the remains of the recognition sequence of the restriction endonuclease. By sequencing the sample specific identifier in combination with part of the fragment located adjacent to the adapter derived sequence, it is possible to uniquely identify restriction fragments. When correlated to the presence or absence of a phenotype, these uniquely identified restriction fragments can be used as molecular markers. This allows for the definition of a new generation of markers and amounts hence to a novel marker technology with the proven versatility of AFLP technology, yet that is suitable for high-throughput technologies and is generally applicable amongst any type of organism or nucleic acid. Uniquely identifying restriction fragments in a sample by determination of part of their sequence by this method can be repeated for multiple samples. The presence or absence of the restriction fragments with the depicted sequence in the sample is indicative for the presence or absence of a phenotype.
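By way of illustration, the decomposition of a sequence read into the sample-specific identifier, the remains of the recognition sequence and the adjacent part of the restriction fragment may be sketched as follows. This is a minimal sketch under stated assumptions: the read is assumed to start directly at the identifier, the remains are those of EcoRI (AATTC), and the identifiers, sample names and reads are hypothetical.

```python
from collections import Counter, defaultdict

REMAINS = "AATTC"  # remains of the EcoRI recognition sequence (G^AATTC)

def split_read(read: str, id_len: int, remains: str = REMAINS):
    """Split a read into (identifier, fragment part); return None if the
    remains of the recognition sequence do not follow the identifier."""
    identifier = read[:id_len]
    start = id_len + len(remains)
    if len(read) <= start or read[id_len:start] != remains:
        return None
    return identifier, read[start:]

def tabulate(reads, id_to_sample, id_len):
    """Count fragment sequences per sample; reads with unknown IDs are skipped."""
    counts = defaultdict(Counter)
    for read in reads:
        parsed = split_read(read, id_len)
        if parsed is None:
            continue
        identifier, fragment = parsed
        sample = id_to_sample.get(identifier)
        if sample is not None:
            counts[sample][fragment] += 1
    return counts

# Hypothetical identifiers and reads, for illustration only.
id_to_sample = {"ACGTCA": "parent_1", "TGCATG": "parent_2"}
reads = [
    "ACGTCA" + "AATTC" + "GGCTAGCTAACGTA",
    "ACGTCA" + "AATTC" + "GGCTAGCTAACGTA",
    "TGCATG" + "AATTC" + "CCATAGGATCCGAT",
    "GAGAGA" + "AATTC" + "GGCTAGCTAACGTA",  # unknown identifier, skipped
]
counts = tabulate(reads, id_to_sample, id_len=6)
```

The presence or absence of a given fragment sequence in `counts` for each sample is then the raw observation from which a molecular marker may be defined.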

A further advantage of the presently invented marker technology based on the combination of AFLP and high throughput sequencing is the additional information that can be obtained compared to conventional AFLP technology. In AFLP, amplicons that are designated as AFLP markers typically contain polymorphism in the recognition site, the restriction site or, optionally, in the selective nucleotides. Polymorphisms located further in the restriction fragment typically do not qualify as AFLP markers (apart from perhaps indel polymorphisms). With the present sequencing step, the nucleotides adjacent to the optional selective nucleotides are also determined and this leads to the identification of an increased number of molecular markers and to an improvement in the existing marker technology.

The high throughput sequencing used in the present invention is a method for scientific experimentation especially relevant to the fields of biology and chemistry. Through a combination of modern robotics and other specialised laboratory hardware, it allows a researcher to effectively screen large amounts of samples simultaneously.

It is preferred that the sequencing is performed using high-throughput sequencing methods, such as the methods disclosed in WO 03/004690, WO 03/054142, WO 2004/069849, WO 2004/070005, WO 2004/070007, and WO 2005/003375 (all in the name of 454 Life Sciences), by Seo et al. (2004) Proc. Natl. Acad. Sci. USA 101:5488-93, and technologies of Helios, Solexa, US Genomics, etcetera, which are herein incorporated by reference.

454 Life Sciences Technology

In certain embodiments, it is preferred that sequencing is performed using the apparatus and/or method disclosed in WO 03/004690, WO 03/054142, WO 2004/069849, WO 2004/070005, WO 2004/070007, and WO 2005/003375 (all in the name of 454 Life Sciences), which are herein incorporated by reference. The technology described allows sequencing of 40 million bases in a single run and is 100 times faster and cheaper than competing technology. The sequencing technology roughly consists of 5 steps: 1) fragmentation of DNA and ligation of specific adaptors to create a library of single-stranded DNA (ssDNA); 2) annealing of ssDNA to beads, emulsification of the beads in water-in-oil microreactors and performing emulsion PCR to amplify the individual ssDNA molecules on beads; 3) selection of/enrichment for beads containing amplified ssDNA molecules on their surface; 4) deposition of DNA carrying beads in a PicoTiter™Plate; and 5) simultaneous sequencing in 100,000 wells by generation of a pyrophosphate light signal. The method will be explained in more detail below.

In a preferred embodiment, the sequencing comprises the steps of:

(a) annealing adapted fragments to beads, each bead being annealed with a single adapted fragment;
(b) emulsifying the beads in water-in-oil microreactors, each water-in-oil microreactor comprising a single bead;
(c) loading the beads in wells, each well comprising a single bead; and generating a pyrophosphate signal.

In the first step (a), sequencing adaptors are ligated to fragments within the combination library. Said sequencing adaptor includes at least a “key” region for annealing to a bead, a sequencing primer region and a PCR primer region. Thus, adapted fragments are obtained.

In a first step, adapted fragments are annealed to beads, each bead annealing with a single adapted fragment. To the pool of adapted fragments, beads are added in excess as to ensure annealing of one single adapted fragment per bead for the majority of the beads (Poisson distribution).
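The effect of adding beads in excess can be illustrated with the Poisson distribution mentioned above. The sketch below assumes that the number of fragments annealing to a bead is Poisson distributed with a mean equal to the fragment-to-bead ratio; the ratios shown are arbitrary illustrative values, not values taken from the described method.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a bead carries exactly k fragments (mean lam per bead)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def single_fragment_fraction(lam: float) -> float:
    """Among beads carrying at least one fragment, the fraction carrying
    exactly one fragment."""
    return poisson_pmf(1, lam) / (1.0 - poisson_pmf(0, lam))

# Illustrative fragment-to-bead ratios (assumed values).
for lam in (1.0, 0.5, 0.1):
    print(f"fragments per bead {lam}: {single_fragment_fraction(lam):.3f}")
```

Under this assumption, lowering the fragment-to-bead ratio increases the proportion of DNA-carrying beads that carry a single fragment, at the cost of more empty beads.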

In a next step, the beads are emulsified in water-in-oil microreactors, each water-in-oil microreactor comprising a single bead. PCR reagents are present in the water-in-oil microreactors allowing a PCR reaction to take place within the microreactors. Subsequently, the microreactors are broken, and the beads comprising DNA (DNA positive beads) are enriched.

In a following step, the beads are loaded in wells, each well comprising a single bead. The wells are preferably part of a PicoTiter™Plate allowing for simultaneous sequencing of a large amount of fragments.

After addition of enzyme-carrying beads, the sequence of the fragments is determined using pyrosequencing. In successive steps, the PicoTiter™Plate and the beads as well as the enzyme beads therein are subjected to different deoxyribonucleotides in the presence of conventional sequencing reagents, and upon incorporation of a deoxyribonucleotide a light signal is generated which is recorded. Incorporation of the correct nucleotide will generate a pyrosequencing signal which can be detected.
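The successive nucleotide additions can be illustrated with an idealised model in which each addition yields a signal equal to the number of nucleotides incorporated. This is a simplified sketch only: the cyclic flow order "TACG", the noise-free signal and the function names are assumptions made for illustration, and the sequence passed in is the strand being synthesised.

```python
def flowgram(synthesised: str, flow_order: str = "TACG", max_flows: int = 400):
    """Return (nucleotide, signal) pairs for successive nucleotide additions.
    The idealised signal is the number of bases incorporated in that addition."""
    signals = []
    pos = 0
    flow = 0
    while pos < len(synthesised) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # A run of identical bases is incorporated within a single addition.
        while pos < len(synthesised) and synthesised[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals

def call_bases(signals) -> str:
    """Reconstruct the sequence from an idealised flowgram."""
    return "".join(base * n for base, n in signals)

signals = flowgram("TTAG")
```

In this model a run of identical bases gives a single, larger signal rather than separate signals, which illustrates why identifiers without consecutive identical bases are preferred.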

Pyrosequencing itself is known in the art and described inter alia on www.biotagebio.com; www.pyrosequencing.com/section technology. The technology is further applied in e.g. WO 03/004690, WO 03/054142, WO 2004/069849, WO 2004/070005, WO 2004/070007, and WO 2005/003375 (all in the name of 454 Life Sciences), which are herein incorporated by reference. In the present invention, the beads are preferably equipped with primer (binding) sequences or parts thereof that are capable of binding the amplicons, as the case may be. In other embodiments, the primers used in the amplification are equipped with sequences, for instance at their 5′-end, that allow binding of the amplicons to the beads in order to allow subsequent emulsion polymerisation followed by sequencing. Alternatively the amplicons may be ligated with sequencing adaptors prior to ligation to the beads or the surface. The sequenced amplicons will reveal the identity of the identifier and thus of the presence or absence of the restriction fragment in the sample.

Solexa Technologies

One of the methods for high throughput sequencing is available from Solexa, United Kingdom (www.solexa.co.uk) and described inter alia in WO0006770, WO0027521, WO0058507, WO0123610, WO0157248, WO0157249, WO02061127, WO03016565, WO03048387, WO2004018497, WO2004018493, WO2004050915, WO2004076692, WO2005021786, WO2005047301, WO2005065814, WO2005068656, WO2005068089, WO2005078130. In essence, the method starts with adaptor-ligated fragments of genomic DNA. The adaptor-ligated DNA is randomly attached to a dense lawn of primers that are attached to a solid surface, typically in a flow cell. The other end of the adaptor-ligated fragment hybridizes to a complementary primer on the surface. The primers are extended in the presence of nucleotides and polymerases in a so-called solid-phase bridge amplification to provide double stranded fragments. This solid-phase bridge amplification may be a selective amplification. Denaturation and repetition of the solid-phase bridge amplification results in dense clusters of amplified fragments distributed over the surface. The sequencing is initiated by adding four differently labelled reversible terminator nucleotides, primers and polymerase to the flow cell. After the first round of primer extension, the labels are detected, the identity of the first incorporated bases is recorded and the blocked 3′ terminus and the fluorophore are removed from the incorporated base. Then the identity of the second base is determined in the same way and so sequencing continues.

In the present invention, the adaptor-ligated restriction fragments or the amplicons are bound to the surface via the primer binding sequence or the primer sequence. The sequence is determined as outlined, including the identifier sequence and (part of) the restriction fragment. Currently available Solexa technology allows for the sequencing of fragments of about 25 base pairs. By economical design of the adaptors and the surface bound primers, the sequencing step reads through the sample identifier, the remains of the recognition sequence of the restriction endonuclease and any optional selective bases. When a 6 bp sample identifier is used and the remains are from the rare cutter EcoRI (AATTC), the use of two selective bases yields an internal sequence of the restriction fragment of 12 bp that can be used to uniquely identify the restriction fragment in the sample.
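The arithmetic behind the 12 bp figure can be made explicit: the read length is reduced by the identifier, the remains of the recognition sequence and the selective bases. The following is a minimal sketch; the function and parameter names are illustrative only.

```python
def internal_bases_read(read_len: int, id_len: int, remains_len: int, selective: int) -> int:
    """Bases of internal restriction-fragment sequence left in a read after the
    identifier, the remains of the recognition sequence and the selective bases."""
    internal = read_len - id_len - remains_len - selective
    if internal < 0:
        raise ValueError("read is too short for the given adaptor design")
    return internal

# 25 bp read, 6 bp identifier, EcoRI remains (5 bp), two selective bases.
ecori_internal = internal_bases_read(25, 6, len("AATTC"), 2)
print(ecori_internal)  # 25 - 6 - 5 - 2 = 12
```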

In a preferred embodiment based on the Solexa sequencing technology above, the amplification of the adapter-ligated restriction fragments is performed with a primer that contains at most one selective nucleotide at its 3′ end, preferably no selective nucleotides at its 3′ end, i.e. the primer is only complementary to the adaptor (a+0 primer).

In alternative embodiments directed to the sequencing methods described herein, the primers used in the amplification may contain specific sections (as alternative to the herein described primer or primer binding sequences) that are used in the subsequent sequencing step to bind the adaptor-capped restriction fragments or amplicons to the surface. These are generally depicted as the key region or the 5′-primer compatible sequence.

In one embodiment of the invention, the nucleic acid sample is digested with at least one restriction enzyme and at least one adapter is ligated that comprises a recognition sequence for a type IIs restriction endonuclease. The subsequent digestion of the adapter-ligated restriction fragment with a type IIs restriction endonuclease yields, as the distance between the recognition and restriction site of a type IIs enzyme is relatively short (up to about 30 nucleotides), a shorter and a longer restriction fragment, to which a IIs restriction site compatible adaptor can be ligated. Typically, the overhang of the IIs-restricted site is unknown such that a set of adaptors may be used that are degenerated in the overhang. After (selective) amplification, the amplicons can be sequenced. The adaptor sequence in this embodiment generally follows: 5′-primer binding site - sample identifier sequence - degenerate type IIs cohesive end sequence-3′. The associated PCR primer generally follows: primer sequence - sample identifier sequence - degenerate type IIs cohesive end sequence - selective nucleotides-3′. The primer used to initiate the sequencing-by-synthesis then generally has the structure: 5′-primer binding site-3′. A size selection step may be preferred after digesting with the IIs enzyme to remove the smaller fragments. As in this embodiment the remains of the restriction site are for this type of enzyme typically in the order of 2-4 bp, this results, in combination with a 6 bp sample identifier, in the sequencing of 15-17 bp of a restriction fragment.
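A set of adaptors that is degenerated in the overhang can be enumerated by generating every possible overhang of the relevant length and appending it to the primer binding site and the sample identifier. The sketch below is illustrative only: the primer binding site and identifier sequences are hypothetical placeholders, and only the layout (primer binding site, sample identifier, degenerate cohesive end, written 5′ to 3′) is taken from the description above.

```python
from itertools import product

def degenerate_adaptors(primer_site: str, sample_id: str, overhang_len: int) -> list[str]:
    """One adaptor strand (5' to 3': primer binding site, sample identifier,
    cohesive end) for every possible overhang of the given length."""
    return [
        primer_site + sample_id + "".join(overhang)
        for overhang in product("ACGT", repeat=overhang_len)
    ]

# Hypothetical sequences, for illustration only.
adaptors = degenerate_adaptors(primer_site="GACTGCGTACC", sample_id="ACGTCA", overhang_len=2)
```

The number of adaptors grows as 4 to the power of the overhang length, i.e. 16, 64 and 256 adaptors per sample identifier for overhangs of 2, 3 and 4 nucleotides.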

In a further aspect, the invention relates to kits comprising one or more primers, and/or one or more adaptors for use in the method, aside from conventional components for kits per se. Furthermore the present invention finds application in, amongst others, use of the method for the identification of molecular markers, for genotyping, bulk segregant analysis, genetic mapping, marker-assisted back-crossing, mapping of quantitative trait loci, linkage disequilibrium mapping.

EXAMPLE

DNA was isolated from 2 parents and 88 offspring using conventional methods. Parents (2×) and offspring (=4×) were in duplex with different indices to test reproducibility. Tags used to distinguish samples from each other differed at least in 2 nucleotides from any other tag used in the experiments. Quality is being tested throughout the various steps using agarose and PAA gels.

Example 1

For each DNA sample a restriction-ligation step is performed using EcoRI and MseI as enzymes. Adaptors are based on the hybridizing sequences located on the surface of the Solexa high throughput sequencing system, more in particular the EcoRI adapter contains the P5 sequence (sequence primer part) and the MseI adaptor contains the P7 sequence (bridge PCR primer sequence). The EcoRI adaptor further contains the sample identifying tag. 96 different EcoRI adaptors and one MseI adaptor are used. It is possible to use a degenerated EcoRI adaptor. The template preparation is inclusive of a size selection step by incubation of the mixture for 10 minutes at 80 degrees Celsius after the restriction (EcoRI+MseI) step but prior to the adapter ligation step. Fragments smaller than 130 nt are removed (in a maize sample).

The complexity of the mixture is reduced by a selective preamplification using +1 primers (i.e. containing one randomly selective nucleotide at the 3′ end), using 96 EcoRI+1 primers and one MseI+1 primer (or one tag-degenerated EcoRI+1 primer and one MseI+1 primer). Selective amplification to reduce the complexity of the mixture to the desired size is performed using EcoRI+2 (=P5 side) and MseI+3 (=P7 side) primers, necessitating the use of 96 EcoRI+2 primers and one MseI+3 primer. Tail PCR is performed using an EcoRI primer with the P5 bridge PCR primer sequence as the tail. The products are purified using Sephadex™ columns. Concentrations are determined and normalised and pools are created. The pools are subjected to massive parallel sequencing based on Solexa technology comprising bridge PCR amplification and sequencing followed by data analysis to determine the genotypes of the parents and the offspring.
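The complexity reduction obtained with selective nucleotides can be estimated with a simple model. The sketch below assumes, as a rough approximation only, that the four bases are equally likely at each selective position, so that each selective nucleotide retains about one quarter of the fragments; real genomes deviate from this. The function names are illustrative.

```python
def matches_selective(flank: str, selective: str) -> bool:
    """True if the fragment bases directly adjacent to the remains of the
    restriction site match the selective nucleotides of the primer."""
    return flank.startswith(selective)

def expected_retained_fraction(n_first: int, n_second: int) -> float:
    """Expected fraction of fragments amplified with n_first and n_second
    selective nucleotides on the two primers (equal base frequencies assumed)."""
    return 0.25 ** (n_first + n_second)

pre_amplification = expected_retained_fraction(1, 1)   # +1/+1 primers
final_amplification = expected_retained_fraction(2, 3)  # EcoRI+2 / MseI+3 primers
```

Under this assumption a +1/+1 preamplification retains about 1 in 16 fragments and the EcoRI+2/MseI+3 combination about 1 in 1024.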

An alternative scenario does not use tail PCR, but employs phosphorylated EcoRI+2 primers. Due to the mismatch with the original adaptor, the annealing temperature in the amplification profile is lowered by 3 degrees Celsius to 13 cycles touch-down from 62-53 degrees Celsius followed by 23 cycles at 53 degrees Celsius. After ligation of the adaptor with the P5 bridging PCR sequence, PCR is performed with P5 and P7 bridge PCR primers.

A second alternative scenario is based on standard template preparation as outlined herein before and selective (pre)amplification to reduce the complexity. Selective amplification is performed with primers that contain the reconstituted EcoRI and MseI restriction sites. This allows for removal of the adaptor sequences prior to sequencing, thereby reducing the amount of data to be analysed. The products are purified by Sephadex™ columns to remove remains of Taq DNA polymerase. A template preparation follows wherein the (reconstituted site) adapter sequences are replaced by Solexa adaptors, using ten-fold increased EcoRI adaptor and EcoRI enzyme to compensate for the increased number of EcoRI sites compared to genomic DNA. The Solexa EcoRI adaptors also contain the tags, hence 96 tagged Solexa EcoRI adaptors are needed. The bottom strand of the adaptor is blocked at the 3′ end (in this case by 3′ amino) to block extension by a polymerase. PCR is performed with P5 and P7 bridge PCR primers. Products are purified by Qiagen columns.

Example 2

Sequence-based detection of AFLP fragments was performed using Solexa's Clonal Single Molecule Array (CSMA™) technology, a Sequencing-by-Synthesis platform capable of analyzing up to 40 million individual fragments in a single sequence run.

The experimental sequence involves AFLP template preparation, selective (AFLP) amplification, single molecule bridge amplification and sequencing of millions of sequence tags from one restriction enzyme end of the AFLP fragments. Maize parental lines B73 and Mo17 and 87 Recombinant Inbred Lines (RILs) were used, and over 8.9 million EcoRI AFLP fragment termini were sequenced to provide proof-of-principle for sequence-based AFLP detection.

Parental lines B73 and Mo17 and 87 RILs were selected. AFLP templates were prepared using restriction enzyme combination EcoRI/MseI. Selective amplification was performed using +2/+3 AFLP primers.

Template fragments for Solexa CSMA bridge amplification were prepared by performing a second restriction/ligation using EcoRI adaptors containing unique 5 bp sample identification (ID) tag sequences. Parental lines and three RIL samples were included twice using different 5 bp sample ID tags to measure within-experiment reproducibility.

Sequence-based AFLP markers were identified by extracting 27 bp sequence tags observed at different frequencies in B73 and Mo17, segregating in the RIL offspring.

Sequence-based AFLP marker data were compared to AFLP marker scores obtained by conventional AFLP fingerprinting using length-based detection of the four corresponding EcoRI/MseI +3/+3 primer combinations.

Sequence Run Statistics 5 Flow Cells

# sequence tags generated                                               8,941,407
# sequence tags with known sample IDs                                   8,029,595
# different sequence tags with known sample IDs                         206,758
# Mbp sequence data generated                                           241.4
frequency range total # sequence tags per sample                        55,374-112,527
# sequence tag AFLP markers                                             125
frequency range sequence tag AFLP markers in parent scoring present     90-17,218
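The run statistics can be cross-checked with simple arithmetic. The sketch below assumes that every sequence tag contributes 27 bp (the tag length used for marker extraction in this example); the variable names are illustrative.

```python
tags_generated = 8_941_407
tags_known_id = 8_029_595
tag_length_bp = 27  # assumed read length per tag, taken from the 27 bp sequence tags

# Total sequence data in Mbp and the share of tags with a known sample ID.
mbp_generated = tags_generated * tag_length_bp / 1e6
known_id_fraction = tags_known_id / tags_generated

print(f"{mbp_generated:.1f} Mbp; {known_id_fraction:.1%} of tags with known sample IDs")
```

Under this assumption the product of tag count and tag length reproduces the reported 241.4 Mbp, and roughly nine out of ten tags carry a known sample ID.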

Sequence Tag AFLP Marker Definition and Scoring

- tabulate sequence tags representation per sample
- remove sequence tags with unknown sample IDs
- normalize sample representation based on total sequence tags per sample
- remove sequence tags with >2 fold frequency difference in parental duplos
- average tag frequencies parental duplos
- define sequence tag AFLP marker if frequency P1/P2 exceeds threshold value
- score presence/absence of sequence tag markers in RIL offspring
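The steps listed above can be sketched in code as follows. This is a minimal sketch under stated assumptions, not the procedure actually used: the ratio threshold, the presence cut-off, the normalisation to tags per million and all sample names and counts are hypothetical, and tags with unknown sample IDs are assumed to have been removed before tabulation.

```python
def normalise(counts):
    """Convert raw tag counts per sample to frequencies per million tags."""
    freq = {}
    for sample, tags in counts.items():
        total = sum(tags.values())
        freq[sample] = {tag: 1e6 * n / total for tag, n in tags.items()}
    return freq

def fold(a: float, b: float) -> float:
    """Fold difference between two frequencies (1.0 if both are zero)."""
    low, high = sorted((a, b))
    if high == 0:
        return 1.0
    return float("inf") if low == 0 else high / low

def define_markers(freq, p1_dups, p2_dups, ratio_threshold=10.0, max_dup_fold=2.0):
    """Keep tags that are reproducible in both parental duplos and whose
    averaged parental frequencies differ by more than ratio_threshold."""
    tags = set().union(*(freq[s] for s in p1_dups + p2_dups))
    markers = {}
    for tag in tags:
        p1 = [freq[s].get(tag, 0.0) for s in p1_dups]
        p2 = [freq[s].get(tag, 0.0) for s in p2_dups]
        if fold(*p1) > max_dup_fold or fold(*p2) > max_dup_fold:
            continue  # >2 fold difference within a parental duplo
        mean1, mean2 = sum(p1) / len(p1), sum(p2) / len(p2)
        if fold(mean1, mean2) > ratio_threshold:
            markers[tag] = "P1" if mean1 > mean2 else "P2"
    return markers

def score(freq, markers, sample, presence_cutoff):
    """Score presence (True) or absence (False) of each marker in a sample."""
    return {tag: freq[sample].get(tag, 0.0) >= presence_cutoff for tag in markers}

# Hypothetical tag counts, for illustration only.
counts = {
    "P1_a": {"AAA": 90, "CCC": 10}, "P1_b": {"AAA": 80, "CCC": 20},
    "P2_a": {"CCC": 100}, "P2_b": {"CCC": 100},
    "RIL1": {"AAA": 50, "CCC": 50}, "RIL2": {"CCC": 100},
}
freq = normalise(counts)
markers = define_markers(freq, ("P1_a", "P1_b"), ("P2_a", "P2_b"))
ril1 = score(freq, markers, "RIL1", presence_cutoff=1000.0)
ril2 = score(freq, markers, "RIL2", presence_cutoff=1000.0)
```

In this toy data set only the tag "AAA" qualifies as a marker, being present in parent 1 and absent in parent 2; "CCC" is reproducible but does not exceed the assumed ratio threshold.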

AFLP Marker Distribution AFLP +3/+3: Sequence/Gel-Based

EcoRI +3 base                  total   +A   +C   +G   +T
# sequence tag AFLP markers    125     34   37   37   17
# gel-based AFLP markers       82      29   18   17   18

Reproducibility Sequence Tag AFLP Marker Duplos 3 RIL Offspring

# sequence tag AFLP markers scored               125
# number of data-points in comparison            375
# data-points identical for duplos               372
% concordancy within experiment duplos           99.2%

Conventional Slab Gel Detection:

AFLP marker      B73  Mo17  1  2  3  4  5  6  7  8  9  10  11  12
E36/M50-175.9    −    +     +  −  −  −  −  +  −  +  −  −   −   +
E36/M50-280      +    −     +  −  −  +  −  +  +  −  +  −   −   −
E36/M50-405.8    −    +     +  −  −  −  +  +  +  +  −  +   −   +
E36/M50-243.7    +    −     +  −  −  −  −  −  +  +  +  +   +   +
E36/M50-124.02   +    −     +  −  +  +  +  +  −  −  −  −   +   +
E36/M50-379      +    −     +  −  −  +  +  +  +  +  +  +   −   +
E36/M50-468.9    +    −     +  −  +  +  −  +  −  +  +  +   +   +

Solexa-Based Detection

AFLP marker      B73  Mo17  1  2  3  4  5  6  7  8  9  10  11  12
CGGCGACGTACCGC   −    +     +  −  −  −  −  +  −  +  −  −   −   +
CTAGTAATTATTCC   +    −     +  −  −  +  −  +  +  −  +  −   −   −
CAGCGCCTTCTCCT   −    +     +  −  −  −  +  +  +  +  −  +   −   +
CAGAACTCTGACTT   +    −     +  −  −  −  −  −  +  +  +  +   +   +
CAAATCTGTTAGAT   +    −     +  −  +  +  +  +  −  +  −  −   +   +
CATGAAGGATTTAT   +    −     +  −  −  +  +  +  +  +  +  +   −   +
CAAACAGACAACCG   +    −     +  −  +  +  −  +  −  +  +  +   +   +
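The segregation patterns of the two tables can be compared position by position. The sketch below assumes, for illustration only, that the rows of the two tables correspond to each other in the order listed; this correspondence is not established by sequencing of the gel-based markers. Presence and absence are written as "+" and "-".

```python
columns = ["B73", "Mo17"] + [str(i) for i in range(1, 13)]

gel = [
    ("E36/M50-175.9",  "-++----+-+---+"),
    ("E36/M50-280",    "+-+--+-++-+---"),
    ("E36/M50-405.8",  "-++---++++-+-+"),
    ("E36/M50-243.7",  "+-+-----++++++"),
    ("E36/M50-124.02", "+-+-++++----++"),
    ("E36/M50-379",    "+-+--+++++++-+"),
    ("E36/M50-468.9",  "+-+-++-+-+++++"),
]
sequence_based = [
    ("CGGCGACGTACCGC", "-++----+-+---+"),
    ("CTAGTAATTATTCC", "+-+--+-++-+---"),
    ("CAGCGCCTTCTCCT", "-++---++++-+-+"),
    ("CAGAACTCTGACTT", "+-+-----++++++"),
    ("CAAATCTGTTAGAT", "+-+-++++-+--++"),
    ("CATGAAGGATTTAT", "+-+--+++++++-+"),
    ("CAAACAGACAACCG", "+-+-++-+-+++++"),
]

total = 0
identical = 0
mismatches = []
# Rows are paired in listed order (assumption); scores are compared per column.
for (gel_name, gel_scores), (tag, tag_scores) in zip(gel, sequence_based):
    for column, g, s in zip(columns, gel_scores, tag_scores):
        total += 1
        if g == s:
            identical += 1
        else:
            mismatches.append((gel_name, tag, column))

print(f"{identical} of {total} data-points identical; mismatches: {mismatches}")
```

Under the assumed pairing, the two detection methods differ in a single data-point (sample 8 of the fifth marker pair), in line with the similar segregation patterns noted for the two methods.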

The viability of sequence-based AFLP marker detection was demonstrated using Solexa's CSMA technology, whereby a larger number of AFLP markers is scored using sequence-based detection than on conventional slab gels, presumably due to improved resolution (fragment size) and deep sequencing, which also captures low abundance fragments. Marker data vector comparisons reveal similar segregation patterns between sequence-based detection and slab gel detection; proof of concordancy awaits sequencing of the gel-based AFLP markers.

1. A method for the identification of restriction fragments in a plurality of samples, comprising the steps of: (a) providing a plurality of sample nucleic acids; (b) digesting each sample nucleic acid with at least one restriction endonuclease to obtain a set of restriction fragments; (c) providing double stranded synthetic adaptors comprising a primer compatible sequence and a sample-specific identifier section; (d) ligating the double stranded synthetic adaptors to the restriction fragments in the set, to provide a set of adaptor-ligated restriction fragments; (e) amplifying the set of adaptor-ligated restriction fragments, with one or more primers that are complementary to at least a portion of the adapter to provide for amplified adaptor-ligated restriction fragments (amplicons); (f) determining the sequence of at least the sample-specific identifier section, and part of the sequence of the restriction fragment of the amplified adaptor-ligated restriction fragments; (g) identifying the presence or absence of amplified adaptor-ligated restriction fragments or fragment sequences in the samples; and (h) comparing the identified fragments or fragment sequences between samples.
2. The method according to claim 1, wherein the restriction fragments are molecular markers.
3. The method according to claim 2, wherein the molecular markers are AFLP markers.
4. The method according to claim 1, wherein two or more samples are compared for the presence or absence of restriction fragments or fragment sequences and/or molecular markers.
5. The method according to claim 1, wherein two or more samples are combined in a pool after the step of ligating the adaptors to provide for pooled adaptor-ligated restriction fragments.
6. The method according to claim 5, wherein for each sample in the pool a sample-specific identifier is used that differs from the other sample-specific identifiers in the pool.
7. The method according to claim 1, wherein the primers contain one or more selective nucleotides at the 3′ end.
8. The method according to claim 1, wherein the restriction endonuclease is a type II restriction endonuclease.
9. The method according to claim 1, wherein the restriction endonuclease is a type IIs restriction endonuclease.
10. The method according to claim 1, wherein two or more restriction endonucleases are used.
11. The method according to claim 1, wherein the sequencing is carried out by means of high-throughput sequencing.
12. The method according to claim 8, wherein the high-throughput sequencing is performed on a solid support.
13. The method according to claim 8, wherein the high-throughput sequencing is based on Sequencing-by-Synthesis.
14. The method according to claim 8, wherein the high-throughput sequencing comprises the steps of: annealing the amplicons or adapter-ligated restriction fragments to beads, each bead annealing with a single adapter-ligated restriction fragment or amplicon; emulsifying the beads in water-in-oil micro reactors, each water-in-oil micro reactor comprising a single bead; performing emulsion PCR to amplify adapter-ligated restriction fragments or amplicons on the surface of beads; optionally, selecting/enriching beads containing amplified amplicons; loading the beads in wells, each well comprising a single bead; and determining the nucleotide sequence of the amplified adapter-ligated restriction fragments or amplified amplicons by generating a pyrophosphate signal.
15. The method according to claim 8, wherein the high-throughput sequencing comprises the steps of: annealing the adapter-ligated restriction fragments or amplicons to a surface containing first and second primers or first and second primer binding sequences respectively, performing bridge amplification to provide clusters of amplified adapter-ligated restriction fragments or amplified amplicons, determining the nucleotide sequence of the amplified adapter-ligated restriction fragments or amplified amplicons using labeled reversible terminator nucleotides.
16. The method according to claim 1, wherein the identifier is from 4-16 bp.
17. The method according to claim 13, wherein the identifier does not contain 2 or more identical consecutive bases.
18. The method according to claim 13, wherein for two or more samples, the corresponding identifiers contain at least two different nucleotides.
19. A method for the identification of molecular markers for genotyping, bulk segregant analysis, genetic mapping, marker-assisted back-crossing, mapping of quantitative trait loci, or linkage disequilibrium mapping, comprising the steps of: (a) providing a plurality of sample nucleic acids; (b) digesting each sample nucleic acid with at least one restriction endonuclease to obtain a set of restriction fragments; (c) providing double stranded synthetic adaptors comprising a primer-compatible sequence and a sample-specific identifier section; (d) ligating the double stranded synthetic adaptors to the restriction fragments in the set, to provide a set of adaptor-ligated restriction fragments; (e) amplifying the set of adaptor-ligated restriction fragments, with one or more primers that are complementary to at least a portion of the adaptor to provide for amplified adaptor-ligated restriction fragments (amplicons); (f) determining the sequence of at least the sample-specific identifier section and part of the sequence of the restriction fragment of the amplified adaptor-ligated restriction fragments; and (g) identifying the presence or absence of amplified adaptor-ligated restriction fragments or fragment sequences in the sample; and (h) comparing the identified fragments or fragment sequences between samples.
20. A kit comprising one or more primers as defined in claim 1.
21. A kit comprising one or more adaptors as defined in claim 1.
22. A kit comprising primers and adaptors as defined in claim 1.